*Copyright Pierian Data 2017*
*For more information, visit us at www.pieriandata.com*

Stock Market Analysis Project

Please Note: You are free to treat this as a full exercise, or just view the solutions video as a code along project. This project is meant to be pretty challenging as it will introduce a few new concepts through some hints!

Welcome to your first capstone project! This project is meant to cap off the first half of the course, which mainly dealt with learning the libraries that we use in this course, the second half of the course will deal a lot more with quantitative trading techniques and platforms.

We'll be analyzing stock data related to a few car companies, from Jan 1 2012 to Jan 1 2017. Keep in mind that this project is mainly just to practice your skills with matplotlib, pandas, and numpy. Don't infer financial trading advice from the analysis we do here!

Part 0: Import

Import the various libraries you will need-you can always just come back up here or import as you go along :)

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pandas_datareader as web
import datetime
%matplotlib inline

Part 1: Getting the Data

Tesla Stock (Ticker: TSLA on the NASDAQ)

*Note! Not everyone will be working on a computer that will give them open access to download the stock information using pandas_datareader (firewalls, admin permissions, etc...). Because of this, the csv file for the Tesla is provided in a data folder inside this folder. It is called Tesla_Stock.csv. Feel free to just use this with read_csv!

Use pandas_datareader to obtain the historical stock information for Tesla from Jan 1, 2012 to Jan 1, 2017.

In [2]:
beg_dt = datetime.datetime(2012,1,1)
end_dt = datetime.datetime(2017,1,1)
In [3]:
stk_tsla = web.DataReader('TSLA', 'yahoo', beg_dt, end_dt)
In [9]:
stk_tsla.head()
Out[9]:
High Low Open Close Volume Adj Close
Date
2012-01-03 29.500000 27.650000 28.940001 28.080000 928100 28.080000
2012-01-04 28.670000 27.500000 28.209999 27.709999 630100 27.709999
2012-01-05 27.930000 26.850000 27.760000 27.120001 1005500 27.120001
2012-01-06 27.790001 26.410000 27.200001 26.910000 986300 26.910000
2012-01-09 27.490000 26.120001 27.000000 27.250000 897000 27.250000
In [5]:
 
Out[5]:
Open High Low Close Volume
Date
2012-01-03 28.94 29.50 27.65 28.08 928052
2012-01-04 28.21 28.67 27.50 27.71 630036
2012-01-05 27.76 27.93 26.85 27.12 1005432
2012-01-06 27.20 27.79 26.41 26.89 687081
2012-01-09 27.00 27.49 26.12 27.25 896951
In [6]:
 

Other Car Companies

Repeat the same steps to grab data for Ford and GM (General Motors),

In [4]:
stk_f = web.DataReader('F', 'yahoo', beg_dt, end_dt)
stk_f.head()
Out[4]:
High Low Open Close Volume Adj Close
Date
2012-01-03 11.25 10.99 11.00 11.13 45709900.0 7.823252
2012-01-04 11.53 11.07 11.15 11.30 79725200.0 7.942743
2012-01-05 11.63 11.24 11.33 11.59 67877500.0 8.146585
2012-01-06 11.80 11.52 11.74 11.71 59840700.0 8.230934
2012-01-09 11.95 11.70 11.83 11.80 53981500.0 8.294192
In [8]:
 
Out[8]:
Open High Low Close Volume
Date
2012-01-03 11.00 11.25 10.99 11.13 45709811
2012-01-04 11.15 11.53 11.07 11.30 79725188
2012-01-05 11.33 11.63 11.24 11.59 67877467
2012-01-06 11.74 11.80 11.52 11.71 59840605
2012-01-09 11.83 11.95 11.70 11.80 53981467
In [5]:
stk_gm = web.DataReader('GM', 'yahoo', beg_dt, end_dt)
stk_gm.head()
Out[5]:
High Low Open Close Volume Adj Close
Date
2012-01-03 21.180000 20.750000 20.830000 21.049999 9321300.0 16.103352
2012-01-04 21.370001 20.750000 21.049999 21.150000 7856700.0 16.179853
2012-01-05 22.290001 20.959999 21.100000 22.170000 17880600.0 16.960161
2012-01-06 23.030001 22.240000 22.260000 22.920000 18234500.0 17.533915
2012-01-09 23.430000 22.700001 23.200001 22.840000 12084500.0 17.472712
In [10]:
 
Out[10]:
Open High Low Close Volume
Date
2012-01-03 20.83 21.18 20.75 21.05 9321420
2012-01-04 21.05 21.37 20.75 21.15 7856752
2012-01-05 21.10 22.29 20.96 22.17 17884040
2012-01-06 22.26 23.03 22.24 22.92 18234608
2012-01-09 23.20 23.43 22.70 22.84 12091714
In [11]:
 

Part 2: Visualizing the Data

Time to visualize the data.

Follow along and recreate the plots below according to the instructions and explanations.


Recreate this linear plot of all the stocks' Open price ! Hint: For the legend, use label parameter and plt.legend()

In [22]:
# Code Here
# dates not formatted the same way.  Need major divider at 6 months with %Y-%m format
df = pd.DataFrame(stk_tsla['Open'])
df.rename(columns={'Open' : 'Tesla'}, inplace=True)
df['GM'] = stk_gm['Open']
df['Ford'] = stk_f['Open']
df.plot(figsize=(16,7), title='Open Price');
In [10]:
# Jose did it much simpler without making a dataframe
stk_tsla['Close'].plot(label='Tesla', figsize=(16,8), title='Open Price')
stk_gm['Close'].plot(label='GM')
stk_f['Close'].plot(label='Ford')
# you can hide the <matplotlib.legend.....> stuff by putting a semicolon after the below line
plt.legend();
In [51]:
 
Out[51]:
<matplotlib.legend.Legend at 0x2145b62dfd0>

Plot the Volume of stock traded each day.

In [7]:
df2 = pd.DataFrame(stk_tsla['Volume'])
df2.rename(columns={'Volume': 'Tesla'}, inplace=True)
df2['GM'] = stk_gm['Volume']
df2['Ford'] = stk_f['Volume']
df2.plot(figsize=(16,7), title='Volume Traded')
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c9a4c3750>
In [12]:
# again, this can be done simply without making a new dataframe
stk_tsla['Volume'].plot(label='Tesla', figsize=(16,7), title='Volume Traded')
stk_gm['Volume'].plot(label='GM')
stk_f['Volume'].plot(label='Ford')
plt.legend();
In [14]:
 
Out[14]:
<matplotlib.legend.Legend at 0x214543b3828>

Interesting, looks like Ford had a really big spike somewhere in late 2013. What was the date of this maximum trading volume for Ford?

Bonus: What happened that day?

In [54]:
# there has to be an easier way to do this
df2.index.values[df2['Ford'].argmax()]
Out[54]:
numpy.datetime64('2013-12-18T00:00:00.000000000')
In [14]:
# OK, the answer was wrong in Jose's video because it only gave the line number
# this was the answer: stk_f['Volume'].argmax()
# therefore it's important to pull the index value of that line number using stk_f.index.values[]
stk_f.index.values[stk_f['Volume'].argmax()]
Out[14]:
numpy.datetime64('2013-12-18T00:00:00.000000000')
In [16]:
 
Out[16]:
Timestamp('2013-12-18 00:00:00')

The Open Price Time Series Visualization makes Tesla look like its always been much more valuable as a company than GM and Ford. But to really understand this we would need to look at the total market cap of the company, not just the stock price. Unfortunately our current data doesn't have that information of total units of stock present. But what we can do as a simple calcualtion to try to represent total money traded would be to multply the Volume column by the Open price. Remember that this still isn't the actual Market Cap, its just a visual representation of the total amount of money being traded around using the time series. (e.g. 100 units of stock at \$10 each versus 100000 units of stock at $1 each)

Create a new column for each dataframe called "Total Traded" which is the Open Price multiplied by the Volume Traded.

In [15]:
# Code Here 
stk_tsla['Total Traded'] = stk_tsla['Open'] * stk_tsla['Volume']
stk_gm['Total Traded'] = stk_gm['Open'] * stk_gm['Volume']
stk_f['Total Traded'] = stk_f['Open'] * stk_f['Volume']
In [19]:
 

Plot this "Total Traded" against the time index.

In [9]:
# Code here
df3 = pd.DataFrame(stk_tsla['Total Traded'])
df3.rename(columns={'Total Traded' : 'Tesla'}, inplace=True)
df3['GM'] = stk_gm['Total Traded']
df3['Ford'] = stk_f['Total Traded']
ax = df3.plot(figsize=(16,7))
ax.set_ylabel('Total Traded')
Out[9]:
Text(0, 0.5, 'Total Traded')
In [20]:
# again, making a new dataframe is not necessary - he didn't focus on the y axis label
stk_tsla['Total Traded'].plot(label='Tesla', figsize=(16,7), title='Total Traded')
stk_gm['Total Traded'].plot(label='GM')
stk_f['Total Traded'].plot(label='Ford')
plt.legend();
In [58]:
 
Out[58]:
<matplotlib.text.Text at 0x2145f5b9c88>

Interesting, looks like there was huge amount of money traded for Tesla somewhere in early 2014. What date was that and what happened?

In [70]:
df3.index.values[df3['Tesla'].argmax()]
Out[70]:
numpy.datetime64('2014-02-25T00:00:00.000000000')
In [21]:
 
Out[21]:
Timestamp('2014-02-25 00:00:00')

Let's practice plotting out some MA (Moving Averages). Plot out the MA50 and MA200 for GM.

In [77]:
# Code here
stk_gm['MA50'] = stk_gm['Open'].rolling(window=50).mean()
stk_gm['MA200'] = stk_gm['Open'].rolling(window=200).mean()
stk_gm[['Open', 'MA50', 'MA200']].plot(figsize=(16,8))
Out[77]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f114fc64610>
In [50]:
 
Out[50]:
<matplotlib.axes._subplots.AxesSubplot at 0x2145e911518>

Finally lets see if there is a relationship between these stocks, after all, they are all related to the car industry. We can see this easily through a scatter matrix plot. Import scatter_matrix from pandas.plotting and use it to create a scatter matrix plot of all the stocks'opening price. You may need to rearrange the columns into a new single dataframe. Hints and info can be found here: https://pandas.pydata.org/pandas-docs/stable/visualization.html#scatter-matrix-plot

In [28]:
from pandas.plotting import scatter_matrix
In [87]:
scatter_matrix(df, figsize=(8,8), hist_kwds={'bins':50})
Out[87]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f1151c95710>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f114eaaddd0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f114ea54a90>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x7f114ea12e10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f114e9c8ad0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f114e988e50>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x7f114e93db10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f114e8fc350>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f114e8fce90>]],
      dtype=object)
In [31]:
# in making the dataframe df, I worked too hard here's an easy way to assemble the dataframe in two lines
# this line is wrong because it passes the data as series:
# car_comp = pd.concat(stk_tsla['Open'], stk_gm['Open'], stk_f['Open'])
# instead, enclose the series in brackets and specify axis=1
car_comp = pd.concat([stk_tsla['Open'], stk_gm['Open'], stk_f['Open']], axis=1)
car_comp.columns = ['Tesla Open', 'GM Open', 'Ford Open']

# then just graph the dataframe 
# BTW, the bins is passed in as a dictionary hence the brackets bins is the key
scatter_matrix(car_comp, figsize=(8,8), alpha = 0.2, hist_kwds={'bins' : 50});
In [76]:
 

Bonus Visualization Task! (Note: This is hard!)

Let's now create a candlestick chart! Watch the video if you get stuck on trying to recreate this visualization, there are quite a few steps involved!Refer to the video to understand how to interpret and read this chart. Hints: https://matplotlib.org/examples/pylab_examples/finance_demo.html

Create a CandleStick chart for Ford in January 2012 (too many dates won't look good for a candlestick chart)

In [10]:
f_beg_date = datetime.datetime(2012,1,1)
f_end_date = datetime.datetime(2012,2,1)
stk_f.head()
Out[10]:
High Low Open Close Volume Adj Close Total Traded
Date
2012-01-03 11.25 10.99 11.00 11.13 45709900.0 7.823252 5.028089e+08
2012-01-04 11.53 11.07 11.15 11.30 79725200.0 7.942743 8.889359e+08
2012-01-05 11.63 11.24 11.33 11.59 67877500.0 8.146585 7.690521e+08
2012-01-06 11.80 11.52 11.74 11.71 59840700.0 8.230934 7.025298e+08
2012-01-09 11.95 11.70 11.83 11.80 53981500.0 8.294192 6.386011e+08
In [11]:
# create filtered dataframe
df4 = stk_f[(stk_f.index >= f_beg_date) & (stk_f.index < f_end_date)]
In [12]:
df4
Out[12]:
High Low Open Close Volume Adj Close Total Traded
Date
2012-01-03 11.25 10.99 11.00 11.13 45709900.0 7.823252 5.028089e+08
2012-01-04 11.53 11.07 11.15 11.30 79725200.0 7.942743 8.889359e+08
2012-01-05 11.63 11.24 11.33 11.59 67877500.0 8.146585 7.690521e+08
2012-01-06 11.80 11.52 11.74 11.71 59840700.0 8.230934 7.025298e+08
2012-01-09 11.95 11.70 11.83 11.80 53981500.0 8.294192 6.386011e+08
2012-01-10 12.05 11.63 12.00 11.80 121750600.0 8.294192 1.461007e+09
2012-01-11 12.18 11.65 11.74 12.07 63806000.0 8.483976 7.490824e+08
2012-01-12 12.18 11.89 12.16 12.14 48687700.0 8.533181 5.920424e+08
2012-01-13 12.08 11.84 12.01 12.04 46366700.0 8.462890 5.568641e+08
2012-01-17 12.26 11.96 12.20 12.02 44398400.0 8.448833 5.416605e+08
2012-01-18 12.37 12.00 12.03 12.34 47102700.0 8.673759 5.666455e+08
2012-01-19 12.72 12.43 12.48 12.61 70894200.0 8.863540 8.847596e+08
2012-01-20 12.64 12.45 12.55 12.59 43705700.0 8.849483 5.485065e+08
2012-01-23 12.84 12.55 12.69 12.66 49379700.0 8.898687 6.266284e+08
2012-01-24 12.86 12.46 12.56 12.82 45768400.0 9.011153 5.748511e+08
2012-01-25 12.98 12.70 12.80 12.93 54021600.0 9.088470 6.914765e+08
2012-01-26 13.05 12.66 13.03 12.79 75470700.0 8.990064 9.833832e+08
2012-01-27 12.53 11.79 11.96 12.21 142155300.0 8.616063 1.700177e+09
2012-01-30 12.44 12.00 12.06 12.29 57752600.0 8.672517 6.964964e+08
2012-01-31 12.51 12.20 12.47 12.42 46412100.0 8.764253 5.787589e+08
In [13]:
# candlestick chart done via plotly
import plotly.graph_objects as go
fig = go.Figure(data=[go.Candlestick(x=df4.index,
                                   open=df4['Open'],
                                   high=df4['High'],
                                   low=df4['Low'],
                                   close=df4['Close'])])
fig.show()
In [16]:
# use the matplotlib libraries for candlesticks
import mplfinance as mpf

# the default is bar charts - the type code for this is 'ohlc' - default for volume is False
mpf.plot(df4)
In [17]:
# use the matplotlib libraries for candlesticks
import mplfinance as mpf

# the default is bar charts - the type code for this is 'ohlc'- here's with volume on
mpf.plot(df4, volume=True)
In [18]:
# use the matplotlib libraries for candlesticks (needed to pip install this library)
import mplfinance as mpf

# here is the candlestick format
mpf.plot(df4, type='candle')
In [19]:
# use the matplotlib libraries for candlesticks (needed to pip install this library)
import mplfinance as mpf

# here is the candlestick format with volume
mpf.plot(df4, type='candle', volume=True)
In [44]:
import mplfinance as mpf
from matplotlib.dates import date2num, DateFormatter, WeekdayLocator, DayLocator, MONDAY
# here is an easier way to get a filtered dataframe for just that January's data
# also the .reset_index() moves the Date field out of the index to be just a field but still a datetime
ford_reset = stk_f.loc['2012-01'].reset_index()
ford_reset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Date          20 non-null     datetime64[ns]
 1   High          20 non-null     float64       
 2   Low           20 non-null     float64       
 3   Open          20 non-null     float64       
 4   Close         20 non-null     float64       
 5   Volume        20 non-null     float64       
 6   Adj Close     20 non-null     float64       
 7   Total Traded  20 non-null     float64       
dtypes: datetime64[ns](1), float64(7)
memory usage: 1.4 KB
In [45]:
# calculate a new column converting the Date field to be a number
ford_reset['date_ax'] = ford_reset['Date'].apply(lambda date : date2num(date))
ford_reset.head()
Out[45]:
Date High Low Open Close Volume Adj Close Total Traded date_ax
0 2012-01-03 11.25 10.99 11.00 11.13 45709900.0 7.823252 5.028089e+08 734505.0
1 2012-01-04 11.53 11.07 11.15 11.30 79725200.0 7.942743 8.889359e+08 734506.0
2 2012-01-05 11.63 11.24 11.33 11.59 67877500.0 8.146585 7.690521e+08 734507.0
3 2012-01-06 11.80 11.52 11.74 11.71 59840700.0 8.230934 7.025298e+08 734508.0
4 2012-01-09 11.95 11.70 11.83 11.80 53981500.0 8.294192 6.386011e+08 734511.0
In [46]:
# this is the order of fields expected for the box plot
list_of_cols = ['date_ax', 'Open', 'High', 'Low', 'Close']
ford_values = [tuple(vals) for vals in ford_reset[list_of_cols].values ]
# the above makes a bunch of tuple values where each tuple is a row of the column values
ford_values
Out[46]:
[(734505.0, 11.0, 11.25, 10.989999771118164, 11.130000114440918),
 (734506.0,
  11.149999618530273,
  11.529999732971191,
  11.069999694824219,
  11.300000190734863),
 (734507.0,
  11.329999923706055,
  11.630000114440918,
  11.239999771118164,
  11.59000015258789),
 (734508.0,
  11.739999771118164,
  11.800000190734863,
  11.520000457763672,
  11.710000038146973),
 (734511.0,
  11.829999923706055,
  11.949999809265137,
  11.699999809265137,
  11.800000190734863),
 (734512.0, 12.0, 12.050000190734863, 11.630000114440918, 11.800000190734863),
 (734513.0,
  11.739999771118164,
  12.180000305175781,
  11.649999618530273,
  12.069999694824219),
 (734514.0,
  12.15999984741211,
  12.180000305175781,
  11.890000343322754,
  12.140000343322754),
 (734515.0,
  12.010000228881836,
  12.079999923706055,
  11.84000015258789,
  12.039999961853027),
 (734519.0,
  12.199999809265137,
  12.260000228881836,
  11.960000038146973,
  12.020000457763672),
 (734520.0, 12.029999732971191, 12.369999885559082, 12.0, 12.34000015258789),
 (734521.0,
  12.479999542236328,
  12.720000267028809,
  12.430000305175781,
  12.609999656677246),
 (734522.0,
  12.550000190734863,
  12.640000343322754,
  12.449999809265137,
  12.59000015258789),
 (734525.0,
  12.6899995803833,
  12.84000015258789,
  12.550000190734863,
  12.65999984741211),
 (734526.0,
  12.5600004196167,
  12.859999656677246,
  12.460000038146973,
  12.819999694824219),
 (734527.0,
  12.800000190734863,
  12.979999542236328,
  12.699999809265137,
  12.930000305175781),
 (734528.0,
  13.029999732971191,
  13.050000190734863,
  12.65999984741211,
  12.789999961853027),
 (734529.0,
  11.960000038146973,
  12.529999732971191,
  11.789999961853027,
  12.210000038146973),
 (734532.0, 12.0600004196167, 12.4399995803833, 12.0, 12.289999961853027),
 (734533.0,
  12.470000267028809,
  12.510000228881836,
  12.199999809265137,
  12.420000076293945)]
In [54]:
# all of Jose's work above is no longer necessary because the new mplfinance library does it all automatically
# https://github.com/matplotlib/mplfinance
# mpf.plot(data)
# where data is a Pandas DataFrame object containing Open, High, Low and Close data, with a Pandas DatetimeIndex
ford_reset2 = stk_f.loc['2012-01']
mpf.plot(ford_reset2, type='candle')
In [128]:
 

Part 3: Basic Financial Analysis

Now it is time to focus on a few key financial calculations. This will serve as your transition to the second half of the course. All you need to do is follow along with the instructions, this will mainly be an exercise in converting a mathematical equation or concept into code using python and pandas, something we will do often when working with quantiative data! If you feel very lost in this section, don't worry! Just go to the solutions lecture and treat it as a code-along lecture, use whatever style of learning works best for you!

Let's begin!


Daily Percentage Change

First we will begin by calculating the daily percentage change. Daily percentage change is defined by the following formula:

$ r_t = \frac{p_t}{p_{t-1}} -1$

This defines r_t (return at time t) as equal to the price at time t divided by the price at time t-1 (the previous day) minus 1. Basically this just informs you of your percent gain (or loss) if you bought the stock on day and then sold it the next day. While this isn't necessarily helpful for attempting to predict future values of the stock, its very helpful in analyzing the volatility of the stock. If daily returns have a wide distribution, the stock is more volatile from one day to the next. Let's calculate the percent returns and then plot them with a histogram, and decide which stock is the most stable!

Create a new column for each dataframe called returns. This column will be calculated from the Close price column. There are two ways to do this, either a simple calculation using the .shift() method that follows the formula above, or you can also use pandas' built in pct_change method.

Tesla Calculations

In [24]:
stk_tsla['returns'] = (stk_tsla['Close'] / stk_tsla.shift(periods=1)['Close'])-1
stk_tsla.head()
Out[24]:
High Low Open Close Volume Adj Close Total Traded returns
Date
2012-01-03 29.500000 27.650000 28.940001 28.080000 928100 28.080000 2.685921e+07 NaN
2012-01-04 28.670000 27.500000 28.209999 27.709999 630100 27.709999 1.777512e+07 -0.013177
2012-01-05 27.930000 26.850000 27.760000 27.120001 1005500 27.120001 2.791268e+07 -0.021292
2012-01-06 27.790001 26.410000 27.200001 26.910000 986300 26.910000 2.682736e+07 -0.007743
2012-01-09 27.490000 26.120001 27.000000 27.250000 897000 27.250000 2.421900e+07 0.012635
In [56]:
# another way to do this is to use the built-in method in pandas to calculate this field
stk_tsla['returns'] = stk_tsla['Close'].pct_change(1)
stk_tsla.head()
Out[56]:
High Low Open Close Volume Adj Close Total Traded returns
Date
2012-01-03 29.500000 27.650000 28.940001 28.080000 928100 28.080000 2.685921e+07 NaN
2012-01-04 28.670000 27.500000 28.209999 27.709999 630100 27.709999 1.777512e+07 -0.013177
2012-01-05 27.930000 26.850000 27.760000 27.120001 1005500 27.120001 2.791268e+07 -0.021292
2012-01-06 27.790001 26.410000 27.200001 26.910000 986300 26.910000 2.682736e+07 -0.007743
2012-01-09 27.490000 26.120001 27.000000 27.250000 897000 27.250000 2.421900e+07 0.012635
In [134]:
 
Out[134]:
Open High Low Close Volume Total Traded returns
Date
2012-01-03 28.94 29.50 27.65 28.08 928052 26857824.88 NaN
2012-01-04 28.21 28.67 27.50 27.71 630036 17773315.56 -0.013177
2012-01-05 27.76 27.93 26.85 27.12 1005432 27910792.32 -0.021292
2012-01-06 27.20 27.79 26.41 26.89 687081 18688603.20 -0.008481
2012-01-09 27.00 27.49 26.12 27.25 896951 24217677.00 0.013388

Ford Calculations

In [25]:
stk_f['returns'] = (stk_f['Close'] / stk_f.shift(periods=1)['Close'])-1
stk_f.head()
Out[25]:
High Low Open Close Volume Adj Close Total Traded returns
Date
2012-01-03 11.25 10.99 11.00 11.13 45709900.0 7.823252 5.028089e+08 NaN
2012-01-04 11.53 11.07 11.15 11.30 79725200.0 7.942743 8.889359e+08 0.015274
2012-01-05 11.63 11.24 11.33 11.59 67877500.0 8.146585 7.690521e+08 0.025664
2012-01-06 11.80 11.52 11.74 11.71 59840700.0 8.230934 7.025298e+08 0.010354
2012-01-09 11.95 11.70 11.83 11.80 53981500.0 8.294192 6.386011e+08 0.007686
In [57]:
stk_f['returns'] = stk_f['Close'].pct_change(1)
stk_gm['returns'] = stk_gm['Close'].pct_change(1)
In [59]:
stk_tsla['returns']  # single brackets return a series
stk_tsla[['returns']]  # double brackets returns a dataframe
Out[59]:
returns
Date
2012-01-03 NaN
2012-01-04 -0.013177
2012-01-05 -0.021292
2012-01-06 -0.007743
2012-01-09 0.012635
... ...
2016-12-23 0.023459
2016-12-27 0.029015
2016-12-28 0.000957
2016-12-29 -0.023027
2016-12-30 -0.004611

1258 rows × 1 columns

In [138]:
 
Out[138]:
Open High Low Close Volume Total Traded mpl_time returns
Date
2012-01-03 11.00 11.25 10.99 11.13 45709811 5.028079e+08 NaN NaN
2012-01-04 11.15 11.53 11.07 11.30 79725188 8.889358e+08 NaN 0.015274
2012-01-05 11.33 11.63 11.24 11.59 67877467 7.690517e+08 NaN 0.025664
2012-01-06 11.74 11.80 11.52 11.71 59840605 7.025287e+08 NaN 0.010354
2012-01-09 11.83 11.95 11.70 11.80 53981467 6.386008e+08 NaN 0.007686

GM Calculations

In [28]:
stk_gm['MA50'] = stk_gm['Close'].rolling(window=50).mean()
stk_gm['MA200'] = stk_gm['Close'].rolling(200).mean()
stk_gm['returns'] = (stk_gm['Close'] / stk_gm.shift(periods=1)['Close'])-1
stk_gm.head()
Out[28]:
High Low Open Close Volume Adj Close Total Traded returns MA50 MA200
Date
2012-01-03 21.180000 20.750000 20.830000 21.049999 9321300.0 16.103352 1.941627e+08 NaN NaN NaN
2012-01-04 21.370001 20.750000 21.049999 21.150000 7856700.0 16.179853 1.653835e+08 0.004751 NaN NaN
2012-01-05 22.290001 20.959999 21.100000 22.170000 17880600.0 16.960161 3.772807e+08 0.048227 NaN NaN
2012-01-06 23.030001 22.240000 22.260000 22.920000 18234500.0 17.533915 4.059000e+08 0.033829 NaN NaN
2012-01-09 23.430000 22.700001 23.200001 22.840000 12084500.0 17.472712 2.803604e+08 -0.003490 NaN NaN
In [139]:
 
Out[139]:
Open High Low Close Volume Total Traded MA50 MA200 returns
Date
2012-01-03 20.83 21.18 20.75 21.05 9321420 1.941652e+08 NaN NaN NaN
2012-01-04 21.05 21.37 20.75 21.15 7856752 1.653846e+08 NaN NaN 0.004751
2012-01-05 21.10 22.29 20.96 22.17 17884040 3.773532e+08 NaN NaN 0.048227
2012-01-06 22.26 23.03 22.24 22.92 18234608 4.059024e+08 NaN NaN 0.033829
2012-01-09 23.20 23.43 22.70 22.84 12091714 2.805278e+08 NaN NaN -0.003490

Now plot a histogram of each companies returns. Either do them separately, or stack them on top of each other. Which stock is the most "volatile"? (as judged by the variance in the daily returns we will discuss volatility in a lot more detail in future lectures.)

In [34]:
stk_f['returns'].hist(bins=50)
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c926bc2d0>
In [141]:
 
Out[141]:
<matplotlib.axes._subplots.AxesSubplot at 0x21466bb6f98>
In [35]:
stk_gm['returns'].hist(bins=50)
Out[35]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c92572890>
In [142]:
 
Out[142]:
<matplotlib.axes._subplots.AxesSubplot at 0x21466c7e710>
In [36]:
stk_tsla['returns'].hist(bins=50)
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c92580850>
In [143]:
 
Out[143]:
<matplotlib.axes._subplots.AxesSubplot at 0x21466ea6048>
In [58]:
# assemble dataframe
df5 = pd.DataFrame(stk_tsla['returns'])
df5.rename(columns={'returns' : 'Tesla'}, inplace=True)
df5['GM'] = stk_gm['returns']
df5['Ford'] = stk_f['returns']

# chart dataframe
fig = plt.figure(figsize=(12,6))

plt.hist(df5['Tesla'], bins=100, alpha=0.5)
plt.hist(df5['GM'], bins=100, alpha=0.5)
plt.hist(df5['Ford'], bins=100, alpha=0.5)
plt.grid()
plt.legend(df5)
Out[58]:
<matplotlib.legend.Legend at 0x7f7c8d74ec90>
In [63]:
# the above method is way too much work - a simpler way to do this is the following
stk_tsla['returns'].hist(bins=100, figsize=(12,6), label='Tesla', alpha=0.5)
stk_gm['returns'].hist(bins=100, label='GM', alpha=0.5)
stk_f['returns'].hist(bins=100, label='Ford', alpha=0.5)
plt.legend();  # this last piece turns on the legend with the labels and the ";" supresses the code above the chart

# the purpose of the histogram is to compare the VOLATILITY of returns between stocks Tesla as most, Ford as least
In [150]:
 
Out[150]:
<matplotlib.legend.Legend at 0x21469288978>

Try also plotting a KDE instead of histograms for another view point. Which stock has the widest plot?

In [63]:
# chart dataframe (dataframe assembled already above)
df5.plot.density(figsize=(12,6))
Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c8ade6750>
In [66]:
# again, this took work to assemble the dataframe - here's how to do it simpler
stk_tsla['returns'].plot.density(figsize=(12,6), label='Tesla')
stk_gm['returns'].plot.density(label='GM')
stk_f['returns'].plot.density(label='Ford')
plt.legend();

# the purpose of this chart is to compare the STABILITY of the different stocks with Ford being the highest and 
# Tesla being the lowest in terms of density of data
In [156]:
 
Out[156]:
<matplotlib.legend.Legend at 0x2146a1dde10>

Try also creating some box plots comparing the returns.

In [69]:
# there is no easier way to do box plots so all of the columns need to be in a single dataframe
df5.plot.box(figsize=(8,12)).set_xlabel('Returns')
Out[69]:
Text(0.5, 0, 'Returns')
In [74]:
# perhaps an easier way to assemble the dataframe though - two lines
box_df = pd.concat([stk_tsla['returns'], stk_gm['returns'], stk_f['returns']], axis=1)
box_df.columns = ['Tesla Returns', 'GM Returns', 'Ford Returns']
box_df.plot.box(figsize=(8,12));

# this is another way to analyze stocks to show the volatility between the stocks
In [205]:
 
Out[205]:
<matplotlib.axes._subplots.AxesSubplot at 0x2146e738da0>

Comparing Daily Returns between Stocks

Create a scatter matrix plot to see the correlation between each of the stocks daily returns. This helps answer the questions of how related the car companies are. Is Tesla begin treated more as a technology company rather than a car company by the market?

In [73]:
from pandas.plotting import scatter_matrix
df5.rename(columns={'Tesla' : 'Tesla Returns', 'GM' : 'GM Returns', 'Ford' : 'Ford Returns'}, inplace=True)
scatter_matrix(df5, figsize=(8,8), hist_kwds={'bins':50})
Out[73]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89cee050>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89c7d590>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89ca5c90>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89c644d0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89c19cd0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89bd8510>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89b8ed10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89b4d550>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x7f7c89b580d0>]],
      dtype=object)
In [80]:
from pandas.plotting import scatter_matrix
scatter_matrix(box_df, figsize=(8,8), hist_kwds={'bins':50}, alpha=0.2);
In [219]:
 

It looks like Ford and GM do have some sort of possible relationship, let's plot just these two against eachother in scatter plot to view this more closely!

In [80]:
df5.plot.scatter(x='GM Returns', y='Ford Returns', figsize=(10,8), alpha=0.5)
Out[80]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c895ba7d0>
In [229]:
 
Out[229]:
<matplotlib.axes._subplots.AxesSubplot at 0x21470b00128>

Cumulative Daily Returns

Great! Now we can see which stock was the most wide ranging in daily returns (you should have realized it was Tesla, our original stock price plot should have also made that obvious).

With daily cumulative returns, the question we are trying to answer is the following, if I invested $1 in the company at the beginning of the time series, how much would is be worth today? This is different than just the stock price at the current day, because it will take into account the daily returns. Keep in mind, our simple calculation here won't take into account stocks that give back a dividend. Let's look at some simple examples:

Lets us say there is a stock 'ABC' that is being actively traded on an exchange. ABC has the following prices corresponding to the dates given

Date                        Price
01/01/2018                   10
01/02/2018                   15
01/03/2018                   20
01/04/2018                   25

Daily Return : Daily return is the profit/loss made by the stock compared to the previous day. (This is what ew just calculated above). A value above one indicates profit, similarly a value below one indicates loss. It is also expressed in percentage to convey the information better. (When expressed as percentage, if the value is above 0, the stock had give you profit else loss). So for the above example the daily returns would be

Date                         Daily Return                  %Daily Return
01/01/2018                 10/10 =  1                          -   
01/02/2018                 15/10 =  3/2                       50%
01/03/2018                 20/15 =  4/3                       33%
01/04/2018                 25/20 =  5/4                       20%

Cumulative Return: While daily returns are useful, it doesn't give the investor a immediate insight into the gains he had made till date, especially if the stock is very volatile. Cumulative return is computed relative to the day investment is made. If cumulative return is above one, you are making profits else you are in loss. So for the above example cumulative gains are as follows

Date                       Cumulative Return         %Cumulative Return
01/01/2018                  10/10 =  1                         100 %   
01/02/2018                  15/10 =  3/2                       150 %
01/03/2018                  20/10 =  2                         200 %
01/04/2018                  25/10 =  5/2                       250 %

The formula for a cumulative daily return is:

$ i_i = (1+r_t) * i_{t-1} $

Here we can see we are just multiplying our previous investment at i at t-1 by 1+our percent returns. Pandas makes this very simple to calculate with its cumprod() method. Using something in the following manner:

df[daily_cumulative_return] = ( 1 + df[pct_daily_return] ).cumprod()

Create a cumulative daily return column for each car company's dataframe.

In [85]:
stk_tsla['Cumulative Return'] = (1 + ((stk_tsla['Close']/stk_tsla['Close'].shift(periods=1)-1))).cumprod()
stk_tsla.head()
Out[85]:
High Low Open Close Volume Adj Close Total Traded returns Cumulative Return
Date
2012-01-03 29.500000 27.650000 28.940001 28.080000 928100 28.080000 2.685921e+07 NaN NaN
2012-01-04 28.670000 27.500000 28.209999 27.709999 630100 27.709999 1.777512e+07 -0.013177 0.986823
2012-01-05 27.930000 26.850000 27.760000 27.120001 1005500 27.120001 2.791268e+07 -0.021292 0.965812
2012-01-06 27.790001 26.410000 27.200001 26.910000 986300 26.910000 2.682736e+07 -0.007743 0.958333
2012-01-09 27.490000 26.120001 27.000000 27.250000 897000 27.250000 2.421900e+07 0.012635 0.970442
In [81]:
# I did too much work.  The returns column was already computed so the quickest way was to use it
stk_tsla['Cumulative Return'] = (1 + stk_tsla['returns']).cumprod()
stk_tsla.head()
Out[81]:
High Low Open Close Volume Adj Close Total Traded returns Cumulative Return
Date
2012-01-03 29.500000 27.650000 28.940001 28.080000 928100 28.080000 2.685921e+07 NaN NaN
2012-01-04 28.670000 27.500000 28.209999 27.709999 630100 27.709999 1.777512e+07 -0.013177 0.986823
2012-01-05 27.930000 26.850000 27.760000 27.120001 1005500 27.120001 2.791268e+07 -0.021292 0.965812
2012-01-06 27.790001 26.410000 27.200001 26.910000 986300 26.910000 2.682736e+07 -0.007743 0.958333
2012-01-09 27.490000 26.120001 27.000000 27.250000 897000 27.250000 2.421900e+07 0.012635 0.970442
In [181]:
 
Out[181]:
Open High Low Close Volume Total Traded returns Cumulative Return
Date
2012-01-03 28.94 29.50 27.65 28.08 928052 26857824.88 NaN NaN
2012-01-04 28.21 28.67 27.50 27.71 630036 17773315.56 -0.013177 0.986823
2012-01-05 27.76 27.93 26.85 27.12 1005432 27910792.32 -0.021292 0.965812
2012-01-06 27.20 27.79 26.41 26.89 687081 18688603.20 -0.008481 0.957621
2012-01-09 27.00 27.49 26.12 27.25 896951 24217677.00 0.013388 0.970442

Now plot the Cumulative Return columns against the time series index. Which stock showed the highest return for a $1 invested? Which showed the lowest?

In [91]:
# do the cumulative return calculations for Ford & GM as well as rename the column in Tesla table in prep for join
stk_gm['GM'] = (1 + ((stk_gm['Close']/stk_gm['Close'].shift(periods=1)-1))).cumprod()
stk_f['Ford'] = (1 + ((stk_f['Close']/stk_f['Close'].shift(periods=1)-1))).cumprod()
stk_tsla.rename(columns={'Cumulative Return' : 'Tesla'}, inplace=True)
In [82]:
stk_gm['Cumulative Return'] = (1 + stk_gm['returns']).cumprod()
stk_f['Cumulative Return'] = (1 + stk_f['returns']).cumprod()
In [129]:
# this only get you a panda series - can't be used in joins
# stk_gm[('GM')]

# these produce DataFrame objects that can be used in the join
# pd.DataFrame(stk_tsla['Tesla'])
# pd.DataFrame(stk_gm['GM'])
# pd.DataFrame(stk_f['Ford'])

# assemble a list of dataframes
dfs = [pd.DataFrame(stk_tsla['Tesla']), pd.DataFrame(stk_gm['GM']), pd.DataFrame(stk_f['Ford'])]

# then join the dataframes in the list on the common indices and build a plot on top of it
dfs[0].join(dfs[1:]).plot(figsize=(16,8), title='Cumulative Return')
Out[129]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7c835c2810>
In [84]:
# again, too much work this is so much easier
stk_tsla['Cumulative Return'].plot(figsize=(16,8), label='Tesla', title='Cumulative Return')
stk_gm['Cumulative Return'].plot(label='GM')
stk_f['Cumulative Return'].plot(label='Ford')
plt.legend();
In [183]:
 
Out[183]:
<matplotlib.legend.Legend at 0x2146b9c26d8>

Great Job!

That is it for thsi very basic analysis, this concludes this half of the course, which focuses much more on learning the tools of the trade. The second half of the course is where we really dive into functionality designed for time series, quantitative analysis, algorithmic trading, and much more!